PII: S0306-4573(98)00040-5
Abstract
In this paper, we compare collocation-based similarity measures (Jaccard, Dice and Cosine) for selecting additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean of the two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All of these similarity measures are functions of the two terms' frequencies and their collocation frequency, but they differ in how they measure similarity. The selected measure changes the order of the additional search terms and their weights, and hence has a strong influence on retrieval performance. In our query expansion experiments with these five similarity measures, the additional search terms selected by the Jaccard, Dice and Cosine measures include more frequent terms with lower similarity values than those selected by ACP or NMI. In overall assessments of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency. © 1999 Elsevier Science Ltd. All rights reserved.
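As a rough illustration of how such measures depend only on two term frequencies and a collocation frequency, the following minimal sketch uses the common textbook forms of Jaccard, Dice and Cosine, and computes ACP as the mean of the two conditional probabilities described in the abstract. The function names, the argument names (`f_x`, `f_y`, `f_xy`), the helper `rank_candidates` and the sample counts are all assumptions made for this example; the paper's exact definitions and weighting may differ. NMI is left out because the abstract does not state how the mutual information is normalized.

```python
import math

# Assumed notation (not from the paper):
#   f_x  = frequency of the query term
#   f_y  = frequency of the candidate expansion term
#   f_xy = collocation (co-occurrence) frequency of the two terms
# All functions assume f_x > 0 and f_y > 0.

def jaccard(f_x, f_y, f_xy):
    """Co-occurrences divided by occurrences of either term."""
    return f_xy / (f_x + f_y - f_xy)

def dice(f_x, f_y, f_xy):
    """Twice the co-occurrences divided by the sum of both frequencies."""
    return 2 * f_xy / (f_x + f_y)

def cosine(f_x, f_y, f_xy):
    """Co-occurrences divided by the geometric mean of both frequencies."""
    return f_xy / math.sqrt(f_x * f_y)

def acp(f_x, f_y, f_xy):
    """Mean of the two conditional probabilities P(y|x) and P(x|y)."""
    return 0.5 * (f_xy / f_x + f_xy / f_y)

def rank_candidates(f_x, candidates, measure):
    """Sort candidate terms by similarity to the query term, highest first.

    `candidates` maps a term to a (f_y, f_xy) pair.
    """
    scored = [(term, measure(f_x, f_y, f_xy))
              for term, (f_y, f_xy) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    candidates = {"common": (1000, 80), "rare": (20, 18)}
    for measure in (jaccard, dice, cosine, acp):
        print(measure.__name__, rank_candidates(100, candidates, measure))
```

Passing a different `measure` to `rank_candidates` changes both the scores and, in general, the ordering of the candidate terms, which is the effect the abstract attributes to the choice of similarity measure.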
Similar resources
Browsing is a collaborative process
Interfaces to databases have traditionally been designed as single-user systems that hide other users and their activity. This paper aims to show that collaboration is an important aspect of searching online information stores that requires explicit computerised support. The claim is made that a truly user-centred system must acknowledge and support collaborative interactions between users. C...
Automatic performance evaluation of Web search engines
Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, it is important for both business enterprises and individuals to know which Web search engines are most effective, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used ...
Crossover Improvement for the Genetic Algorithm in Information Retrieval
Genetic algorithms (GAs) search for good solutions to a problem using operations inspired by the natural selection of living beings. Among their many uses is information retrieval (IR). In this field, the aim of the GA is to help an IR system find, in a huge collection of text documents, a good reply to a query expressed by the user. The analysis of phenomena seen during the implement...
Visualising Semantic Spaces and Author Co-Citation Networks in Digital Libraries
This paper describes the development and application of visualisation techniques for users to access and explore information in a digital library effectively and intuitively. Salient semantic structures and citation patterns are extracted from several collections of documents, including the ACM SIGCHI Conference Proceedings (1995–1997) and ACM Hypertext Conference Proceedings (1987–1998), using ...